-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize ImageStat.Stat.extrema #7593
Optimize ImageStat.Stat.extrema #7593
Conversation
The optimzed function improves the performance. The original function always runs through the complete historgram of length 256 even if it is possible to exit the loop early (break). Running some tests I found performance improvements of factor >10 depending on the image. Signed-off-by: Andreas Florath <[email protected]>
Signed-off-by: Andreas Florath <[email protected]>
Hmmm. |
I restarted the AppVeyor job, let's see how it goes. Also approved the GitHub Actions to run. Please could you share the performance tests you ran and their results? |
Yes, the test failures are also happening in main. I've created #7594 to fix them. |
I created a virtualenv with pillow and added the new function side by side with the original, like: def _getextrema_orig(self):
"""Get min/max values for each band in the image"""
def minmax(histogram):
n = 255
x = 0
for i in range(256):
if histogram[i]:
n = min(n, i)
x = max(x, i)
return n, x # returns (255, 0) if there's no data in the histogram
v = []
for i in range(0, len(self.h), 256):
v.append(minmax(self.h[i:]))
return v
def _getextrema(self):
"""Get min/max values for each band in the image"""
def minmax(histogram):
res_min, res_max = 255, 0
for i in range(256):
if histogram[i]:
res_min = i
break
for i in range(255, -1, -1):
if histogram[i]:
res_max = i
break
if res_max >= res_min:
return res_min, res_max
else:
return (255, 0)
v = []
for i in range(0, len(self.h), 256):
v.append(minmax(self.h[i:i+256]))
return v Then I used a collection of images (ImageNet) and run a script to compare the original and the optimized function. I adapted the script slightly so that you can run the test also on the Pillow test images (have to add the try/except block here because there are some broken images in the Pillow test image directory). BTW: this also checks if the original and optimized versions return the same results. import pathlib
import timeit
from PIL import Image, ImageStat
IMAGEDIR="../Pillow/Tests/images"
testdir = pathlib.Path(IMAGEDIR)
NUMBER=1000
REPEAT=10
for image_file_name in testdir.rglob("*"):
# Skip broken images
try:
img = Image.open(image_file_name)
stat = ImageStat.Stat(img)
except Exception as ex:
continue
# Check for correctness
res_orig = stat._getextrema_orig()
res_opt = stat._getextrema()
assert res_orig == res_opt
# Measure improvement factor
exec_times_orig = timeit.repeat(
stmt=stat._getextrema_orig, repeat=REPEAT, number=NUMBER)
exec_times_opt = timeit.repeat(
stmt=stat._getextrema, repeat=REPEAT, number=NUMBER)
print("%10.4f - %s" % (
min(exec_times_orig) / min(exec_times_opt), image_file_name)) For the ImageNet about 80-90% of the images had speedup factor of more than 10. A lot even more than factor 40.
Please note that the first number is the speedup-factor: the factor how much faster the proposed function is measured against the original. |
Simplification of return statement Co-authored-by: Andrew Murray <[email protected]>
Thanks a lot for the hint! |
src/PIL/ImageStat.py
Outdated
|
||
v = [] | ||
for i in range(0, len(self.h), 256): | ||
v.append(minmax(self.h[i:])) | ||
v.append(minmax(self.h[i : i + 256])) | ||
return v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also use list comprehension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea was to have a minimal patch set.
If you want, we could go with
return [minmax(self.h[i:i+256]) for i in range(0, len(self.h), 256)]
My tests showed that from the performance point of view the two solutions are mostly the same. But if my other patch gets accepted #7599 the list comprehension might be the way to go to have the same style.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#7597 has actually just been merged to use list comprehension here. Would you like to resolve the conflict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Let's also mention this and the other one in the release notes (there'll be a template once #6486 is merged). Can be in a followup PR. |
I've created #7605 to include this in the release notes. |
The optimized function improves the performance. The original function always runs through the complete histogram of length 256 even if it is possible to exit the loop early (break).
Running performance tests improvements of factor >10 were measured depending on the images under test.
Changes proposed in this pull request: